The proximal promoter region of the human pituitary-expressed growth hormone (GH1)
gene is highly polymorphic, containing at least 15 single nucleotide polymorphisms 
(SNPs). This variation is manifest in 40 different haplotypes, the high diversity being 
explicable in terms of gene conversion, recurrent mutation and selection. Functional 
analysis showed that 12 haplotypes were associated with a significantly reduced level of 
reporter gene expression whilst 10 haplotypes were associated with a significantly 
increased level. The former occur more frequently in the general population than the 
latter. Although individual SNPs contributed to promoter strength in a highly interactive 
and non-additive fashion, haplotype partitioning identified six SNPs as major 
determinants of GH1 gene expression. The prediction and functional testing of hitherto 
unobserved super-maximal and sub-minimal promoter haplotypes was then used to test 
the efficacy of the haplotype partitioning approach. Mobility shift assays demonstrated 
that five SNP sites exhibit allele-specific protein binding. An association was noted 
between adult height and the mean in vitro expression value corresponding to an 
individual's GH1 promoter haplotype combination (P=0.028); however, only 2.5% of 
the variance of adult height was found to be explicable by reference to this parameter. 
Three additional SNPs, identified within sites I and II of the upstream Locus Control 
Region (LCR), were ascribed to three distinct haplotypes. A series of LCR-GH1
proximal promoter constructs were used to demonstrate that (i) the LCR enhanced 
proximal promoter activity by up to 2.8-fold depending upon proximal promoter 
haplotype and that (ii) the activity of a given proximal promoter haplotype was also 
differentially enhanced by different LCR haplotypes. The genetic basis of inter- 
individual differences in GH1 gene expression thus appears to be extremely complex. 




Introduction 

Human stature is a highly complex trait resulting from the interaction of multiple 
genetic and environmental factors. Since familial short stature is already known to be
associated with inherited mutations of the growth hormone (GH1) gene (Procter et al.
1998), it appears reasonable to suppose that polymorphic variation in this
pituitary-expressed gene can also influence adult height.

The human GH1 gene is located on chromosome 17q23 within a 66 kb cluster of five 
related genes (Chen et al. 1989) including the placentally expressed growth hormone 
gene (GH2), two chorionic somatomammotropin genes (CSH1 and CSH2) and a 
pseudogene (CSHP1). The proximal region of the GH1 gene promoter exhibits a high 
level of sequence variation with 16 single nucleotide polymorphisms (SNPs) reported 
within a 535 base-pair stretch (Giordano et al. 1997; Wagner et al. 1997). The majority 
of these SNPs occur at the same positions in which the GH1 gene differs from the 
paralogous GH2, CSH1, CSH2 and CSHP1 genes, suggesting that they may have arisen 
through gene conversion (Giordano et al. 1997; Krawczak et al. 1999). 

The expression of the human GH1 gene is also influenced by a Locus Control 
Region (LCR) located between 14.5 kb and 32 kb upstream of the GH1 gene (Jones et 
al. 1995). The LCR contains multiple DNase I hypersensitive sites and is required for 
the activation of the genes of the GH gene cluster in both pituitary and placenta (Su et 
al. 2000; Ho et al. 2002). Two DNase I hypersensitive sites (I and II) contain binding 
sites for the pituitary-specific transcription factor Pit-1 and are responsible for the
high-level, somatotrope-specific expression of the GH1 gene (Shewchuk et al. 1999). In an
attempt to identify common genetic factors that might play a role in determining human
stature, we have used in vitro reporter gene and mobility shift assays to assess the
relative importance of polymorphic variation in both the proximal promoter region and
the LCR on GH1 gene expression.



Materials and Methods 

Human subjects 

DNA samples were obtained from lymphocytes taken from 154 male British army recruits of 
Caucasian origin who were unselected for height. Height data were available for 124 of these 
individuals (mean, 1.76 ± 0.07 m) and the height distribution was found to be normal (Shapiro- 
Wilk statistic W=0.984, p=0.16). Ethical approval for these studies was obtained from the local 
Multi-Regional Ethics Committee. 

Polymerase chain reaction (PCR) amplification 

PCR amplification of a 3.2 kb GH1 gene-specific fragment was performed using 
oligonucleotide primers GH1F (5' GGGAGCCCCAGCAATGC 3'; -615 to -599) and GH1R (5' 
TGTAGGAAGTCTGGGGTGC 3'; 2598 to 2616) [numbering relative to the transcriptional 
initiation site at +1 (GenBank Accession No. J03071)]. A 1.9 kb fragment containing sites I and II
of the GH1 LCR was PCR amplified with LCR5A (5' CCAAGTACCTCAGATGCAAGG 3';
-315 to -334) and LCR3.0 (5' CCTTAGATCTTGGCCTAGGCC 3'; 1589 to 1698) [LCR
sequence was obtained from GenBank (Accession No. AC005803) whilst LCR numbering 
follows that of Jin et al. 1999; GenBank (Accession No. AF010280)]. Conditions for both 
reactions were identical; briefly, 200ng lymphocyte DNA was amplified using the Expand™ 
high fidelity system (Roche) using a hot start of 98°C 2 min, followed by 95°C 3 min, 30 cycles 
95°C 30 s, 64°C 30 s, 68°C 1 min. For the last 20 cycles, the elongation step at 68°C was 
increased by 5 s per cycle. This was followed by further incubation at 68°C for 7 min. 

Cloning and sequencing 



Initially, PCR products were sequenced directly without cloning. The proximal promoter
region of the GH1 gene was sequenced from the 3.2 kb GH1-specific PCR fragment using
primer GH1S1 (5' GTGGTCAGTGTTGGAACTGC 3'; -556 to -537). The 1.9 kb GH1 LCR
fragment was sequenced using primers LCR5.0 (5' CCTGTCACCTGAGGATGGG 3'; 993 to
1011), LCR3.1 (5' TGTGTTGCCTGGACCCTG 3'; 1093 to 1110), LCR3.2 (5'
CAGGAGGCCTCACAAGCC 3'; 628 to 645) and LCR3.3 (5' ATGCATCAGGGCAATCGC 3'; 
211 to 228). Sequencing was performed using BigDye v2.0 (Applied Biosystems) and an ABI 
Prism 377 or 3100 DNA sequencer. In the case of heterozygotes for promoter region or LCR 
variants, the appropriate fragment was cloned into pGEM-T (Promega) prior to sequencing. 

Construction of luciferase reporter gene expression vectors

Individual examples of 40 different GH1 proximal promoter haplotypes (Table 1) were PCR
amplified as 582 bp fragments with primers GHPROM5 (5'
AGATCTGACCCAGGAGTCCTCAGC 3'; -520 to -501) and either GHPROM3A (5'
AAGCTTGCAGCTAGGTGAGCTGTC 3'; 44 to 62) or GHPROM3C (5'
AAGCTTGCCGCTAGGTGAGCTGTC 3'; 44 to 62) depending on the base at position +59 of the
haplotype. To facilitate cloning, all primers had partial or complete non-templated restriction
endonuclease recognition sequences added to their 5' ends (denoted in bold above); BglII
(GHPROM5) and HindIII (GHPROM3A and GHPROM3C). PCR fragments were then cloned into
pGEM-T. Plasmid DNA was initially digested with HindIII (New England Biolabs) and the 5'
overhang removed with mung bean nuclease (New England Biolabs). The promoter fragment was
released by digestion with BglII (New England Biolabs) and gel purified. The luciferase reporter
vector pGL3 Basic was prepared by NcoI (New England Biolabs) digestion and the 5' overhang
removed with mung bean nuclease. The vector was then digested with BglII (New England Biolabs)
and gel purified. The restricted promoter fragments were cloned into luciferase reporter gene vector
pGL3 Basic. Plasmid DNAs (pGL3GH series) were isolated (Qiagen system) and
sequenced using primers RV3 (5' CTAGCAAAATAGGCTGTCCC 3'; 4760 to 4779), GH1SEQ1
(5' CCACTCAGGGTCCTGTG 3'; 27 to 43), LUCSEQ1 (5' CTGGATCTACTGGTCTGC 3'; 683
to 700) and LUCSEQ2 (5' GACGAACACTTCTTCATCG 3'; 1372 to 1390) to ensure that both the
GH1 promoter and luciferase gene sequences were correct. A truncated GH1 proximal promoter
construct (-288 to +62) was also made by restriction of pGL3GH1 (haplotype 1) with NcoI and
BglII followed by blunt-ending/religation to remove SNP sites 1-5.

Artificial proximal promoter haplotype reporter gene constructs were made by site-directed
mutagenesis (SDM) [Site-Directed Mutagenesis Kit (Stratagene)] to generate the predicted
super-maximal haplotype (AGGGGTTAT-ATGGAG) and sub-minimal haplotypes
(AG-TTGTGGGACCACT and AG-TTTTGGGGCCACT).

To make the LCR-proximal promoter fusion constructs, the 1.9 kb LCR fragment was
restricted with BglII and the resulting 1.6 kb fragment cloned into the BglII site directly
upstream of the 582 bp promoter fragment in pGL3. The three different LCR haplotypes were
cloned in pGL3 Basic, 5' to one of three GH1 proximal promoter constructs containing
respectively a "high expressing promoter haplotype" (H27), a "low expressing promoter
haplotype" (H23) and a "normal expressing promoter haplotype" (H1) to yield a total of nine
different LCR-GH1 proximal promoter constructs (pGL3GHLCR). Plasmid DNAs were then
isolated (Qiagen midiprep) and sequence checked using appropriate primers.

Luciferase reporter gene assays

In the absence of a human pituitary cell line expressing growth hormone, rat GC pituitary cells
(Bancroft 1973; Bodner and Karin 1989) were selected for in vitro expression experiments. Rat
GC cells were grown in DMEM containing 15% horse serum and 2.5% fetal calf serum. Human
HeLa cells were grown in DMEM containing 5% fetal calf serum. Both cell lines were grown at
37°C in 5% CO2. Liposome-mediated transfection of GC cells and HeLa cells was performed
using Tfx™-20 (Promega) in a 96-well plate format. Confluent cells were removed from culture
flasks, diluted with fresh medium and plated out into 96-well plates so as to be ~80% confluent
by the following day.

The transfection mixture contained serum-free medium, 250 ng pGL3GH or pGL3GHLCR
construct, 2 ng pRL-CMV, and 0.5 µl Tfx™-20 Reagent (Promega) in a total volume of 90 µl per
well. After 1 hr, 200 µl complete medium was added to each well. Following transfection, the
cells were incubated for 24 hrs at 37°C in 5% CO2 before being lysed for the reporter assay.

Luciferase assays were performed using the Dual Luciferase Reporter Assay System 
(Promega). Assays were performed on a microplate luminometer (Applied Biosystems) and then 
normalized with respect to Renilla activity. Each construct was analysed on three independent 
plates with six replicates per plate (i.e. a total of 18 independent measurements). For the 
proximal promoter assays, each plate included negative (promoterless pGL3 Basic) and positive 
(SV40 promoter-containing pGL3) controls. For the LCR analysis, constructs containing the 
proximal promoter but lacking the LCR were used as negative controls. 

Electrophoretic mobility shift assay (EMSA) 

EMSA was performed on double stranded oligonucleotides that together covered all 16 SNP
sites (Table 2). Nuclear extracts from GC and HeLa cells were prepared as described by Berg et
al. (1994). Oligonucleotides were radiolabelled with [γ-33P]-dATP and detected by
autoradiography after gel electrophoresis. EMSA reactions contained a final concentration of
20 mM Hepes pH 7.9, 4% glycerol, 1 mM MgCl2, 0.5 mM DTT, 50 mM KCl, 1.2 µg HeLa cell or
GC cell nuclear extract, 0.4 µg poly[dI-dC]·poly[dI-dC], 0.4 pM radiolabelled oligonucleotide,
40 pM unlabelled competitor oligonucleotide (100-fold excess) where appropriate, in a final
volume of 10 µl. EMSA reactions were incubated on ice for 60 mins and electrophoresed on 4%
PAGE gels at 100V for 45 mins prior to autoradiography. For each reaction, a double stranded 
unlabelled test oligonucleotide was used as a specific competitor whilst an oligonucleotide 
derived from the NF1 gene promoter (5' CCCCGGCCGTGGAAAGGATCCCAC 3') was used 
as a non-specific competitor. Double stranded oligonucleotides corresponding to the human 
prolactin (PRL) gene Pit-1 binding site (5' TCATTATATTCATGAAGAT 3') and the Pit-1 
consensus binding site (5' TGTCTTCCTGAATATGAATAAGAAATA 3') were used as 
specific competitors for protein binding to the SNP 8 site. 

Primer extension assays 

Primer extension assays were performed to confirm that constructs bearing different SNP 
haplotypes utilized identical transcriptional initiation sites. Primer extension followed the method 
of Triezenberg et al. (1992). 

Data normalization 

Expression measurements for negative controls (promoterless pGL3 Basic) exhibited 
considerable variation between plates (Figure 1a). To correct the data for base-line expression
and plate effects, the mean activity of the negative controls on a given plate was subtracted from
all other activity values on the same plate. The mean (plate-corrected) activity for proximal
promoter haplotype 1 (H1) on each plate was then calculated, and all other haplotype-associated
activities on the same plate were divided by this value. These two transformations ensured that
the mean negative control activity equalled zero whilst the mean activity of H1 equalled unity,
independent of plate number. Resulting activity values may thus be interpreted as fold changes
in comparison to H1, corrected for both baseline and plate effects. Since no significant plate
effect was detectable after transformation, the data were combined over plates. The results of
this normalization procedure are illustrated for H1 in Figure 1b. A procedure similar to that used
for the analysis of the proximal promoter haplotypes was also followed for the LCR-promoter
fusion construct expression data, using haplotype A as the reference haplotype.
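
The two plate-wise transformations described above can be sketched as follows. This is a minimal illustration only, not the code used in the study: the function name, the construct labels and the activity values are all hypothetical.

```python
from statistics import mean

def normalize_plate(plate, negative="pGL3_basic", reference="H1"):
    """Normalize one plate of raw luciferase activities.

    `plate` maps a construct name to its list of replicate activities.
    Step 1 subtracts the mean negative-control activity of the plate;
    step 2 divides by the mean corrected activity of the reference haplotype.
    """
    baseline = mean(plate[negative])
    corrected = {name: [v - baseline for v in vals] for name, vals in plate.items()}
    ref_mean = mean(corrected[reference])
    return {name: [v / ref_mean for v in vals] for name, vals in corrected.items()}

# Hypothetical raw activities for a single plate (three replicates each).
plate = {
    "pGL3_basic": [10.0, 12.0, 14.0],
    "H1": [110.0, 112.0, 114.0],
    "H27": [400.0, 412.0, 424.0],
}
norm = normalize_plate(plate)
```

After this step the negative control averages zero and H1 averages one on every plate, so values from different plates can be pooled and read as fold changes relative to H1.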

Statistical analysis

Normalized expression levels of the proximal promoter haplotypes were tested for
goodness-of-fit to a Gaussian distribution using the Shapiro-Wilk statistic (W) as implemented in
procedure UNIVARIATE of the SAS statistical analysis software (SAS Institute Inc., Cary NC,
USA). Significance assessment was adjusted for multiple (i.e. 40-fold) testing by setting
$p_{\mathrm{critical}} = 0.05/40 \approx 0.001$. Using this criterion, the expression levels of two promoter haplotypes
were found to differ significantly from a Gaussian distribution viz. H21 (W=0.727, p=0.0002)
and H40 (W=0.758, p=0.0004). For the other 38 haplotypes, expression levels were regarded as
consistent with normality and were therefore subjected to pair-wise comparison using Tukey's 
studentized range test (SAS procedure GLM). Pair-wise comparison of expression levels 
between groups of different haplotypes was performed using normal approximation z of the 
Wilcoxon rank sum statistic (SAS procedure NPAR1WAY). 
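
The multiple-testing adjustment can be checked with a few lines of arithmetic. The sketch below only re-applies the stated criterion to the two p-values reported above; the variable names are illustrative.

```python
ALPHA = 0.05
N_TESTS = 40  # one normality test per promoter haplotype

# Bonferroni-type threshold; 0.00125, quoted as approximately 0.001 in the text.
threshold = ALPHA / N_TESTS

# Shapiro-Wilk p-values reported for the two deviating haplotypes.
reported_p = {"H21": 0.0002, "H40": 0.0004}
non_gaussian = [hap for hap, p in reported_p.items() if p < threshold]
```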

The SNPs analysed in this study exerted their influence upon proximal promoter expression in 
a complex and highly interactive fashion. Further, owing to linkage disequilibrium, expression
levels associated with individual polymorphisms were found to be strongly interdependent. It 
was thus expected that a substantial proportion of the observed variation in expression level 
would be attributable to variation at a small subset of polymorphic sites. In order to assess 
formally the correlation structure between the SNPs, and to be able to identify an appropriate 
subset of critical polymorphisms for further study, the residual deviance upon haplotype 
partitioning was calculated for all possible subsets of proximal promoter SNPs. 
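
As a rough sketch of such an exhaustive subset search, the code below assumes that the residual deviance of a partitioning is the sum of squared deviations of each measurement from the mean of its partition cell, and that the relative residual deviance is this quantity divided by the deviance of the unpartitioned data. The function names and the toy data are hypothetical; the study itself used other software.

```python
from collections import defaultdict
from itertools import combinations

def residual_deviance(haplotypes, values, snp_idx):
    """Within-cell sum of squares for the partition induced by the SNPs in snp_idx.

    haplotypes: list of allele strings; values: list of replicate lists (same order).
    Returns (deviance, number of distinct partial haplotypes).
    """
    cells = defaultdict(list)
    for hap, vals in zip(haplotypes, values):
        cells[tuple(hap[i] for i in snp_idx)].extend(vals)
    deviance = 0.0
    for vals in cells.values():
        cell_mean = sum(vals) / len(vals)
        deviance += sum((v - cell_mean) ** 2 for v in vals)
    return deviance, len(cells)

def best_subsets(haplotypes, values, n_snps):
    """For each subset size k, find the SNP subset with minimal relative deviance."""
    total, _ = residual_deviance(haplotypes, values, ())  # unpartitioned data
    best = {}
    for k in range(1, n_snps + 1):
        scored = []
        for idx in combinations(range(n_snps), k):
            dev, n_cells = residual_deviance(haplotypes, values, idx)
            scored.append((dev / total, idx, n_cells))
        best[k] = min(scored)
    return best

# Toy data: four 3-SNP haplotypes with two normalized measurements each.
haps = ["GTA", "GTC", "CTA", "CGA"]
vals = [[1.0, 1.2], [1.1, 0.9], [3.0, 3.2], [2.9, 3.1]]
best = best_subsets(haps, vals, n_snps=3)
```

In the toy data the first SNP alone separates low from high expressers, so `best[1]` selects index 0 with two partial haplotypes. For 16 SNPs the search covers 65,535 non-empty subsets, which remains tractable.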



For a given partitioning $\{1, \ldots, m\} = \Pi = \pi_1 \cup \ldots \cup \pi_k$ of a set of data points $x_1, \ldots, x_m$, and with $\Pi(i) = j$ if
$i \in \pi_j$, the residual deviance $\delta$ of $\Pi$ is defined as

$\delta(\Pi) = \sum_{i=1}^{m} (x_i - \bar{x}_{\Pi(i)})^2$,

where $\bar{x}_j$ denotes the mean of the data points in $\pi_j$.

When the data set was not partitioned at all, then $\delta = \delta(\Pi_0) = 421.7$, and the relative residual
deviance of any other partitioning $\Pi$ was defined as $\delta_R(\Pi) = \delta(\Pi)/\delta(\Pi_0)$.

Six SNPs (nos. 1, 6, 7, 9, 11 and 14; see below) were identified as being responsible for a
sizeable proportion (~60%) of the residual deviance in expression level at the same time as
invoking relatively little haplotype variation. The statistical interdependence of these SNPs was
further analysed by means of a regression tree, constructed by recursive binary partitioning using
statistics software R (Ihaka and Gentleman 1996). In the tree construction process, the SNPs
were used individually as predictor variables at each node so as to select the two most
homogeneous subgroups of haplotypes with respect to the response variable (i.e. normalized
proximal promoter expression). The node and SNP that served to introduce a new split were
chosen so as to minimize $\delta_R$ for the partitioning as defined by the terminating nodes ('leaves') of
the resulting intermediate tree. This process was continued until all leaves corresponded to
individual haplotypes ('fully grown tree'). The reliability of the $\delta_R$ estimates was assessed in
each step by 10-fold cross-validation and the standard error (SE) was calculated.

Regression analysis of height and proximal promoter expression level in vitro was performed
for the 124 height-known individuals studied using the CANCORR procedure of the SAS
software package. Let $\mu_{\mathrm{nor},h1}$ and $\mu_{\mathrm{nor},h2}$ denote the mean normalized expression levels of the two
haplotypes carried by a given individual. The height of individuals not homozygous for H1
(n=109) was modelled as

height = $a_0$ + …

and the coefficient of determination, $r^2$, calculated.

A reduced median network (Bandelt et al. 1995) was constructed for the seven promoter
haplotypes (H1 - H7) that were observed at least 8 times in the 154 study individuals.

Linkage disequilibrium analysis 

Linkage disequilibrium (LD) between promoter SNPs, and between SNPs and LCR
haplotypes, was evaluated in 100 individuals randomly chosen from the total of 154 under study,
using parameter $\rho$ as devised for biallelic loci by Morton et al. (2001). Whilst $\rho = 1$ is equivalent
to two loci showing complete LD, $\rho = 0$ indicates complete lack of LD. Only eight SNPs were
found to be sufficiently polymorphic in the population sample (heterozygosity >5%) to warrant
inclusion; SNP 5 was excluded owing to its perfect LD with SNP 4 (only two pair-wise
haplotypes present). Maximum likelihood estimates of the combined LCR-proximal promoter
haplotype frequencies, as required for LD analysis, were obtained using an in-house
implementation of the expectation maximization (EM) algorithm.
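
The in-house EM implementation is not described further. Purely as an illustration of the technique, the sketch below estimates haplotype frequencies for two biallelic loci from unphased genotypes; the genotype coding, function name and example data are assumptions and do not reproduce the study's multi-allelic LCR-promoter analysis.

```python
def em_two_locus(genotypes, max_iter=500, tol=1e-12):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of (g_a, g_b), each the count (0, 1 or 2) of allele '1'
    at locus A and locus B. Returns a dict keyed by (allele_a, allele_b).
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in haps}
        for g_a, g_b in genotypes:
            if g_a == 1 and g_b == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10;
                # weight both phases by their current expected probability.
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1.0 - w
                counts[(1, 0)] += 1.0 - w
            else:
                # Phase is unambiguous when at most one locus is heterozygous.
                alleles_a = (g_a // 2, (g_a + 1) // 2)
                alleles_b = (g_b // 2, (g_b + 1) // 2)
                counts[(alleles_a[0], alleles_b[0])] += 1.0
                counts[(alleles_a[1], alleles_b[1])] += 1.0
        # M-step: new frequencies are the expected haplotype proportions.
        new_freq = {h: c / n_chrom for h, c in counts.items()}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq

# Hypothetical sample of ten individuals.
sample = [(0, 0)] * 6 + [(2, 2)] * 3 + [(1, 1)]
freq = em_two_locus(sample)
```

In this toy sample the single double heterozygote is resolved almost entirely as the 00/11 phase, because the 01 and 10 haplotypes are never observed unambiguously. An LD measure such as $\rho$ would then be computed from the estimated frequencies.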
Results

Proximal promoter polymorphism frequencies and haplotypes 

The GH1 gene promoter region has been reported to contain 16 polymorphic nucleotides
within a 535 bp stretch (Table 3; Giordano et al. 1997; Wagner et al. 1997). These SNPs were
enumerated 1-16 for ease of identification (Figure 2). In a study of 154 male British
Caucasians, 15 of these SNPs (all except no. 2) were found to be polymorphic (minor allele
frequencies 0.003 to 0.41; Table 3). Variation at the 16 positions was ascribed to a total of 36
different promoter haplotypes (Table 1). Haplotype 1 (H1) may thus be described by a
sequence of 16 bases (GGGGGGTATGAAGAAT), representing the 16 SNP locations from
-476 to +59. The frequency of the 36 promoter haplotypes varied from 0.339 for H1,
henceforth referred to as 'wild-type', to 0.0033 (nos. 25-36) (Table 1). A further 4 haplotypes
(nos. 37-40) were found as part of a separate study in 4 individuals exhibiting short stature
(Table 1). These haplotypes were absent from the study group but were included in the 
subsequent analysis for the sake of completeness. 

Proximal promoter haplotypes and relative promoter strength 
The 40 promoter haplotypes were studied by in vitro reporter gene assay and found to differ
with respect to their ability to drive luciferase gene expression in rat pituitary cells (Table 4).
Expression levels were found to vary over a 12-fold range with the lowest expressing
haplotype (no. 17) exhibiting an average level that was 30% that of wild-type and the highest
expressing haplotype (no. 27) exhibiting an average level that was 389% that of wild-type
(Table 4). Twelve haplotypes (nos. 3, 4, 5, 7, 11, 13, 17, 19, 23, 24, 26 and 29) were
associated with a significantly reduced level of luciferase reporter gene expression by
comparison with H1. Conversely, a total of 10 haplotypes (nos. 14, 20, 27, 30, 34, 36, 37, 38,
39 and 40) were associated with a significantly increased level of luciferase reporter gene
expression by comparison with H1 (Table 4). Constructs bearing different SNP haplotypes
were shown by primer extension assay to utilize identical transcriptional initiation sites (data
not shown). Expression of the reporter gene constructs was found to be 1000-fold lower in
HeLa cells than in GC cells (data not shown).

The in vitro expression levels of the 40 different GH1 promoter haplotypes are presented 
graphically in Figure 3. A tendency is apparent for the low expressing haplotypes to occur 
more frequently whereas the high expressing haplotypes tend to occur less frequently 
(Wilcoxon P<0.01). Since this finding is suggestive of the action of selection, selection
effects were sought at the level of individual SNPs. For the 15 SNPs studied here, the mean 
expression level (weighted by haplotype frequency) and the frequency of the rarer allele in 
controls were found to be positively correlated (Spearman rank correlation coefficient, r = 
0.32). If SNP 7 is excluded as an outlier (it has a particularly high expression level associated
with the rarer allele), r = 0.53 with a one-sided p<0.05.

The in vitro expression level associated with the truncated promoter construct lacking 
SNPs 1-5 was 102±5% that of the wild-type (haplotype 1). Thus it may be inferred that 
SNPs 1-5 are likely to have a limited direct influence on GH1 gene expression. 

Expression levels associated with individual SNPs were found to be strongly 
interdependent. An attempt was therefore made to partition the expression data in such 
a way as to identify a subset of key polymorphic sites that contribute disproportionately 
to the observed variation in in vitro expression level. Partitioning by the full haplotype 
comprising all 16 SNPs yielded a relative residual deviance of $\delta_R(\Pi_{16}) = 0.245$. This can
be interpreted in terms of 24.5% of the variation in expression level not being
accountable by variation in haplotype. For $1 \le k < 16$, the minimum-$\delta_R$ partitioning $\Pi_{k,\min}$
was defined as that haplotype partitioning with k SNPs that yielded the smallest relative
residual deviance $\delta_R$. The relationship between k and $\delta_R(\Pi_{k,\min})$, together with the
number of haplotypes comprising $\Pi_{k,\min}$, is depicted in Figure 4. A qualitative difference
was evident between k=6 and k=7 in that the number of haplotypes associated with
$\Pi_{k,\min}$ increases from 13 to 22 whilst $\delta_R(\Pi_{k,\min})$ decreases only marginally
[$\delta_R(\Pi_{6,\min})$ = … vs $\delta_R(\Pi_{7,\min}) = 0.371$]. It was therefore concluded that SNPs 1, 6, 7, 9,
11 and 14, which define $\Pi_{6,\min}$, represented a good choice of key polymorphisms for
further analysis. Of the remaining SNPs, six (nos. 3, 4, 8, 10, 12, and 16) could be
classified as "marginally informative". These markers, in combination with the six key
SNPs, together define 39 of the 40 haplotypes observed, and account for virtually all of
the explicable deviance ($\delta_R(\Pi_{12,\min}) = 0.245$). The other four SNPs (nos. 2, 5, 13 and 15)
were "uninformative" with respect to the normalized in vitro expression level since they
were either monomorphic in our sample (no. 2), or were in perfect (nos. 5 and 13) or
near perfect (no. 15) linkage disequilibrium with other markers.

The correlation structure of the six key SNPs was next assessed using a series of 
successively growing (i.e. nested) regression trees. Following convention in regression 
tree analysis (Therneau and Atkinson 1997), the smallest intermediate tree with a
cross-validated $\delta_R$ within one SE of that of the fully grown tree was chosen as a representative
partitioning (Figure 5). This 'optimal' tree was found to comprise 10 internal and 11
terminal nodes (Figure 6, Table 5). The relative residual deviance of the tree equals
$\delta_R = 0.398$, thereby accounting for $(1-0.397)/(1-0.245) \approx 80\%$ of the deviance explicable
through haplotype partitioning.

The single most important split was by SNP 7 which on its own accounted for 15% of 
the explicable deviance. The four haplotypes carrying the C allele of this SNP define a 
homogeneous subgroup (leaf 11) with a mean normalized expression level 1.8 times
higher than that of H1. Haplotypes carrying the T allele of SNP 7 were further
sub-divided by SNP 9, with allele T of this polymorphism causing higher expression
($\mu_{\mathrm{nor}} = 1.26$) than allele G ($\mu_{\mathrm{nor}} = 0.84$; Wilcoxon z=7.09, p<0.001). The resulting nnTTnn
haplotype was split by SNP 6 (G/T), with nGTTnn forming a terminal node (leaf 8) that
includes the wild-type haplotype H1. Interestingly, the nTTTnn haplotypes, when
sub-divided by SNP 11, manifested a dramatic difference in expression level. Whilst
nTTTGn was found to be a low expresser ($\mu_{\mathrm{nor}} = 0.64$), haplotype nTTTAn exhibited
maximum average expression ($\mu_{\mathrm{nor}} = 3.89$; Wilcoxon z=5.11, p<0.001).

Haplotype nnTGnn for SNPs 7 and 9 was sub-divided by SNPs 14 and 1, with three of 
the resulting haplotypes forming terminal nodes (leaves 1, 6 and 7). The fourth 
haplotype, GnTGnA, was an intermediate expresser ($\mu_{\mathrm{nor}} = 0.86$) that was further split by
SNPs 11 and 6. Interestingly, only one particular combination of SNP 14 and 1 alleles
resulted in increased expression on the SNP 7 and 9 nnTGnn background (AnTGnG,
leaf 7, $\mu_{\mathrm{nor}} = 1.83$). A similar non-additive effect upon expression was also noted for
SNPs 6 and 11 when considered on haplotype GnTGnA: whereas SNP 11 allele A was
associated with higher expression than G in combination with SNP 6 allele T
(GTTGAA $\mu_{\mathrm{nor}} = 1.18$ vs GTTGGA $\mu_{\mathrm{nor}} = 0.74$; Wilcoxon z=7.09, p<0.001), the opposite
held true in combination with SNP 6 allele G (GGTGAA $\mu_{\mathrm{nor}} = 0.74$ vs GGTGGA
$\mu_{\mathrm{nor}} = 1.04$; Wilcoxon z=5.28, p<0.001).

Evolution of haplotype diversity

Of the 15 GH1 gene promoter SNPs found to be polymorphic in this study, alternative 
alleles at 14 positions were potentially explicable by gene conversion since they were 
identical to those in analogous locations in at least one of the four paralogous human 
genes (Table 3). Comparison with the orthologous growth hormone (GH) gene 
promoter sequences of 10 other mammals revealed that the most frequent alleles at
nucleotide positions -75, -57, -31, -6, +3, +16 and +25 (corresponding to SNPs 8-15 
inclusive) in the human GH1 gene were strictly conserved during mammalian evolution 
(Krawczak et al. 1999). Intriguingly, the rarest of the three alternative alleles at the -1
position (SNP 12) in the human GH1 gene was identical to that strictly conserved in the
mammalian orthologues.

A 'Reduced Median Network' (Figure 7) revealed that wild-type haplotype H1 is not
directly connected to other frequent haplotypes by single mutational events. The second
most common haplotype, H2, is connected to H1 via H23 and H12 whilst the third most
common haplotype, H3, is connected to H1 either through a non-observed haplotype or
a double mutation. Expansion of this network so as to incorporate further haplotypes 
was deemed unreliable owing to the small number of observations per haplotype. 
Furthermore, expansion of the network would have entailed the introduction of multiple 
single base-pair substitutions. Since these cannot be distinguished from serial rounds of 
gene conversion between pre-existing haplotypes, the resulting distances in the network 
would have been unlikely to reflect genuine evolutionary relationships. However, this 
may safely be assumed to be the case for the network depicted in Figure 7 that connects 
the seven most frequent haplotypes, since each mutation occurs only once. 

A general decline of linkage disequilibrium (LD) with physical distance was noted
for most SNPs, with some notable exceptions (Table 6). Thus, SNP 9 was found to be in
strong LD with the other SNPs, including SNP 16 which showed comparatively weak
LD with all other proximal promoter SNPs. This finding suggests that the origin of SNP
9 was relatively late. However, SNP 10 was found to be in perfect LD with SNP 12 but
not SNP 11 ($\rho = 0.381$), whereas SNP 8 was in stronger LD with SNP 11 than with SNP
10 ($\rho = 0.925$ vs 0.687). These anomalous findings suggest that the extant pattern of LD
among the proximal promoter SNPs is unlikely to have arisen solely through
recombinational decay with distance, but rather is likely to reflect the action of other 
mechanisms such as recurrent mutation, gene conversion or selection. 

Prediction and functional testing of super-maximal and sub-minimal haplotypes
Based upon the 'optimal' regression tree obtained for the haplotype-dependent proximal 
promoter expression data, an attempt was made to predict potential "super-maximal" 
and "sub-minimal" haplotypes in terms of their levels of expression. To this end, alleles
of the six key SNPs were chosen taking the mean expression levels of the appropriate 
leaves of the tree into account (Table 5). Alleles of the remaining SNPs were 
determined so as to respectively maximize or minimize expression of individual SNPs. 
Thus, for the predicted super-maximal haplotype, alleles of SNPs 6, 7, 9 and 11 were as 
in leaf 10 whilst alleles of SNPs 1 and 14 were as in leaf 7. The sub-minimal haplotype 
was chosen to represent leaf 1 (for SNPs 1, 7, 9 and 14). The best choice of alleles for 
SNPs 6 and 11 was however somewhat ambiguous since leaves 2 (suggesting alleles T
and G) and 4 (suggesting alleles G and A) predicted similarly low mean expression 
levels. Therefore, it was decided to generate both constructs for in vitro testing. 
Completion of the hypothetical haplotypes for the remaining SNPs yielded 
super-maximal haplotype AGGGGTTAT-ATGGAG and 
sub-minimal haplotypes AG-TTGTGGGACCACT, AG-TTTTGGGGCCACT. 
These three artificial haplotypes were then constructed and expressed in rat pituitary 
cells yielding respectively expression levels of 145±4, 55±5 and 20±8% in comparison
to wild-type (haplotype 1). 



Differences between SNP alleles revealed by mobility shift (EMSA) assay 
EMSAs were performed at all proximal promoter SNP sites for all allelic variants using
rat pituitary cells as a source of nuclear protein. Protein interacting bands were noted at
sites -168, -75, -57, -31, -6/-1/+3 and +16/+25 (Table 7). Inter-allelic differences in the
number of protein interacting bands were noted for sites -75 (SNP 8), -57 (SNP 9), -31
(SNP 10), -6/-1/+3 (SNPs 11, 12, 13) and +16/+25 (SNPs 14, 15) [Figure 8; Table 7]. In
the case of the latter two sites, EMSAs on specific SNP allele combinations
suggested that differential protein binding was attributable to allelic variation at SNP 
sites 12 and 15 respectively (Table 7). When the analysis was repeated using a HeLa 
cell extract, only position -57 manifested evidence of a protein interaction and then only 
for the G allele, not the T allele (data not shown). The results of competition 
experiments utilizing oligonucleotides corresponding to two distinct Pit-1 binding sites 
were consistent with one of the two SNP 8 interacting proteins being Pit-1 (Figure 8). 
However, the allele-specific protein interaction remained unaffected implying that the 
other protein involved was not Pit-1. 

Association between promoter haplotype expression in vitro and stature in vivo 
An attempt was made to correlate the haplotype-specific in vitro expression of the GH1 
proximal promoter with adult height in 124 male Caucasians. Each haplotype was 
ascribed its mean expression value from normalized in vitro expression data (Table 4) 
and the average A = (μ_h1 + μ_h2)/2 of the two haplotypes was calculated for each 
individual. Individuals homozygous for H1 were excluded from the analysis since their 
A values (1.0) would not have contributed any causal variation. This yielded a sample 
of 109 height-known individuals with suitable genotypes (Table 8). When height above 
and below the median (1.765 m) was compared to A values above and below the 
median (0.9), evidence for an association between height and GH1 proximal promoter 
haplotype-associated in vitro expression emerged (χ² = 4.846, 1 d.f., P=0.028). This 
notwithstanding, regression analysis using a 2nd degree polynomial demonstrated that 
the two μ values were on their own relatively poor predictors of height. Since the 
coefficient of determination was r² = 0.025, it may be concluded that approximately 2.5% 
of the variance in body height is accounted for by reference to GH1 gene proximal 
promoter haplotype expression in vitro. 
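
The reported test statistic can be reproduced from the four counts in Table 8. The following is a minimal sketch, assuming that the test was a Pearson chi-square without continuity correction (the text does not say which variant was used); the function names are illustrative only.

```python
import math

# 2x2 counts from Table 8: rows = height below/above the median (1.765 m),
# columns = average expression value A below/above the median (0.9)
table = [[34, 22],
         [21, 32]]

def chi_square_2x2(t):
    """Pearson chi-square for a 2x2 table (assumption: no continuity correction)."""
    (a, b), (c, d) = t
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

chi2 = chi_square_2x2(table)
p = p_value_1df(chi2)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

Under that assumption the counts give the same statistic and P value as quoted in the text, which suggests that the 109 genotyped individuals in Table 8 are the sample the test was run on.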

Locus control region (LCR) polymorphisms and proximal promoter strength 
Three novel polymorphic changes were found within sites I and II (required for the 
pituitary-specific expression of the GH1 gene; Jin et al. 1999) of the GH1 LCR in a 
screen of 100 individuals randomly chosen from the study group. These were located at 
nucleotide positions 990 (G/A; 0.90/0.10), 1144 (A/C; 0.65/0.35) and 1194 (C/T; 
0.65/0.35) [numbering after Jin et al. 1999]. The polymorphisms at 1144 and 1194 were 
in total linkage disequilibrium, and three different haplotypes were observed: haplotype 
A (990G, 1144A, 1194C; 0.55), haplotype B (990G, 1144C, 1194T; 0.35) and 
haplotype C (990A, 1144A, 1194C; 0.10). 
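
As a consistency check, the allele frequencies quoted for the three LCR polymorphisms should follow from the three haplotype frequencies. The sketch below recomputes them with exact fractions; the data structure and function name are illustrative and not part of the original analysis.

```python
from fractions import Fraction

# LCR haplotypes as reported: alleles at positions 990, 1144, 1194, then frequency
lcr_haplotypes = {
    "A": ("G", "A", "C", Fraction(55, 100)),
    "B": ("G", "C", "T", Fraction(35, 100)),
    "C": ("A", "A", "C", Fraction(10, 100)),
}
POSITIONS = {990: 0, 1144: 1, 1194: 2}

def allele_frequency(position, allele):
    """Sum the frequencies of all haplotypes carrying the given allele."""
    idx = POSITIONS[position]
    return sum(h[3] for h in lcr_haplotypes.values() if h[idx] == allele)

# Allele pairs observed together at 1144 and 1194; only two pairs occur,
# which is what total linkage disequilibrium between the sites implies.
pairs_1144_1194 = {(h[1], h[2]) for h in lcr_haplotypes.values()}

for pos, allele in [(990, "G"), (990, "A"), (1144, "A"), (1144, "C"), (1194, "C"), (1194, "T")]:
    print(pos, allele, float(allele_frequency(pos, allele)))
```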

In order to determine whether the three LCR haplotypes exert a differential effect on 
the expression of the downstream GH1 gene, a number of different LCR-GH1 proximal 
promoter constructs were made. The three alternative 1.6 kb LCR-containing fragments 
were cloned into pGL3, directly upstream of three distinct types of proximal promoter 
haplotype, viz. a "high expressing promoter" (H27), a "low expressing promoter" (H23) 
and a "normal expressing promoter" (H1), to yield nine different LCR-GH1 proximal 
promoter constructs in all. These constructs were then expressed in both rat GC cells 
and HeLa cells, and the resulting luciferase activities measured. In GC cells, the 
presence of the LCR enhanced expression up to 2.8-fold as compared to the proximal 
promoter alone (Table 9). However, the extent of this inductive effect was dependent 
upon the linked promoter haplotype. Two-way analysis of variance (Table 10) revealed 
that both main effects and the promoter*LCR interaction were significant, with the 
major influence exerted by the proximal promoter. Also included in Table 9 are the 
results of a Tukey studentized range test at the 5% significance level, performed 
individually for each promoter haplotype. In conjunction with promoter haplotype 1, the 
activity of LCR haplotype A is significantly different from that of N (construct 
containing proximal promoter but lacking LCR), but not from that of LCR haplotypes B 
and C; LCR haplotypes B and C differ significantly from each other and from N. With 
promoter 27, however, no significant difference was found between LCR haplotypes. 
No LCR-mediated induction of expression was noted with any of the proximal promoter 
haplotypes in HeLa cells (data not shown). 

Since the physical distance between the LCR and the proximal promoter SNPs was 
too great to permit joint physical haplotyping, the linkage disequilibrium (LD) between 
them was assessed by maximum likelihood methods using genotype data from the 100 
individuals included in the analysis of inter-SNP LD for the proximal promoter. Pairwise 
LD between promoter SNPs and LCR haplotypes was found to be high for all 
SNPs except SNP 16 (Table 6). It may therefore be concluded that SNP 16 was subject 
to recurrent mutation prior to the genesis of SNP 9, the only SNP found to be in strong 
linkage disequilibrium with SNP 16. Substantial differences between LCR haplotypes 
exist in terms of their LD with SNPs 4, 8 and 16 (Table 6), suggesting a relatively 
young age for LCR haplotype B as opposed to haplotype A. 

Discussion 

Evidence that genetic factors play a major role in determining stature comes from a 
variety of sources: from intra-familial resemblance (Preece 1996) and twin study-based 
heritability estimates (Chatterjee et al. 1999) to genome-wide linkage analyses 
(Hirschhorn et al. 2001). One practical consequence of the high degree of heritability is 
that the prediction of a child's target height must take parental height into consideration 
(Luo et al. 1998). Familial short stature has already been shown to be associated with 
inherited mutations of the growth hormone (GH1) gene (Procter et al. 1998). Since the 
degree of polymorphism exhibited by the GH1 gene proximal promoter region (16 
SNPs in 535 bp) is extremely high, some 30-fold higher than the average for genomic 
DNA (Brookes 1999; Patil et al. 2001), it appeared worthwhile to explore the 
proposition that polymorphic variation in this promoter might influence adult height. 
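
The "30-fold" figure can be checked with simple arithmetic. The sketch below assumes a genome-wide average of roughly one SNP per 1000 bp, which is a commonly quoted estimate and is an assumption here; the text itself only cites Brookes (1999) and Patil et al. (2001) without giving the number.

```python
# Figures taken from the text
snps_in_promoter = 16
promoter_length_bp = 535

# Assumption (not stated in the text): about one SNP per 1000 bp genome-wide
genome_bp_per_snp = 1000.0

promoter_bp_per_snp = promoter_length_bp / snps_in_promoter
fold_enrichment = genome_bp_per_snp / promoter_bp_per_snp

print(f"one SNP per {promoter_bp_per_snp:.1f} bp in the promoter")
print(f"about {fold_enrichment:.0f}-fold the assumed genomic average")
```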

In our study population, variation occurred at 15 of the 16 SNP locations and 
manifested itself in a total of 40 different promoter haplotypes. Twelve haplotypes were 
found to be associated with a significantly reduced level of luciferase reporter gene 
expression by comparison with haplotype 1, whereas 10 haplotypes were associated 
with a significantly increased level. The inverse relationship noted between GH1 
haplotype expression level in vitro and population prevalence is intriguing. It may be 
that natural selection has acted so as to increase the frequency of low expressing 
haplotypes for reasons quite unrelated to stature e.g. resistance to infection (Saito et al. 
1996), starvation (Collins 1995) or trauma (Maison et al. 1998; Takala et al. 1999). 

The association noted between in vitro promoter haplotype expression and adult 
height is also remarkable, particularly since expression values were derived from an 
experimental system that employed a heterologous (rat) pituitary cell line and artificial 
promoter constructs that lacked the LCR. It follows that our estimate of the variance in 
adult height attributable to polymorphic variation in the GH1 gene promoter (2.5%) is 
likely to be conservative, and should therefore be regarded as a minimum. Indeed, since 
the influence of polymorphic variation within the coding region, introns and 3' flanking 
region of the GH1 gene (Low et al. 1989; Zhang et al. 1992; Kolb et al. 1998) has not 
been measured in this analysis, the influence of GH1 gene variation on both GH1 gene 
expression and adult height could have been underestimated. Although GH is a major 
regulator of human post-natal growth, the hypothalamic/pituitary GH axis includes (or 
is influenced by) many other factors encoded by multiple genes (e.g. GHR, POU1F1, 
SHOX, IGF1, LHX3, GHRH and GHRHR) [reviewed by Pfaffle et al. 2000]. It is thus 
reasonable to suppose that these genes may also harbour genetic variants that contribute 
to the variance of human stature. 

From the haplotype frequencies observed in our study group, it is predicted that 
some 8.2% of the normal population possess two low expressing GH1 proximal 
promoter haplotypes (either identical or non-identical) that are associated with in vitro 
GH production <55% that of the wild-type. It remains to be seen whether such 
haplotype combinations occur disproportionately in individuals of short stature; their 
possession could increase the likelihood that such individuals would come to clinical 
attention. 

Various cis-acting regulatory sequences have been identified in the proximal 
promoter region of the human GH1 gene. These sequences include binding sites for 
NF1 (-286 to -274; Courtois et al. 1990), Sp1 (-136 to -127; Lemaigre et al. 1989a), the 
pituitary-specific transcription factor, Pit-1 (-132 to -107, -92 to -67; Lemaigre et al. 
1989b), the vitamin D receptor (VDRE; -60 to -46, -37 to -31; Alonso et al. 1998; 
Seoane et al. 2002) and CREB, a protein that interacts with cAMP-responsive elements 
(-188 to -184, -100 to -96; Shepard et al. 1994). Some of these factors may exert their 
effects synergistically whereas others appear to bind to promoter motifs in a mutually 
exclusive fashion. Inspection of the GH1 gene promoter region suggests that some of 
the 15 SNPs are located within transcription factor binding sites (Figure 2). Thus, three 
SNPs cluster around the transcriptional initiation site (SNPs 11-13), one occurs at the 3' 
end of the proximal VDRE adjacent to the TATA box (SNP 10), one within the distal 
VDRE (SNP 9), one within the proximal Pit-1 binding site (SNP 8) and one within an 
NF1 binding site (SNP 6). Expression analysis of a truncated promoter construct was 
consistent with the limited influence of SNPs 1-5 on GH1 gene expression. Intriguingly, 
the nucleotide positions corresponding to SNPs 8 to 15 are strictly conserved in other 
mammals, a finding which while compatible with their candidacy as functional 
polymorphisms, nevertheless represents a caveat for the interpretation of phylogenetic 
footprinting studies (Krawczak et al. 1999). Partitioning of the haplotypes identified six 
SNPs (nos. 1, 6, 7, 9, 11 and 14) as major determinants of GH1 gene expression level, 
with a further six SNPs being marginally informative (nos. 3, 4, 8, 10, 12 and 16). The 
functional significance of all 16 SNPs was investigated by EMSA assays which 
indicated that six polymorphic sites in the GH1 proximal promoter interact with nucleic 
acid binding proteins; for 5 of these sites [-75 (SNP 8), -57 (SNP 9), -31 (SNP 10), -1 
(SNP 12) and +25 (SNP 15)], alternative alleles exhibited differential protein binding. 

Despite the evident non-additivity of the effects of individual SNP alleles on GH1 
gene expression, an attempt was made to predict potential super-maximal and 
sub-minimal haplotypes in terms of their expression levels. When tested, one of the 
sub-minimal haplotypes did indeed manifest a lower level of expression than any naturally 
occurring haplotype, a result which indicates the efficacy of the process of haplotype 
partitioning. However, with the other two artificial haplotypes tested, success in 
obtaining predicted levels of expression was only partial. Thus although certain key 
SNPs are identifiable as exerting a disproportionate effect on the expression level 
associated with a particular haplotype, the expression associated with novel SNP 
combinations is not entirely predictable. It follows that promoter haplotypes should 
perhaps be considered in Gestalt terms rather than simply as sums of their component 
parts. 

The molecular basis for haplotype-dependent differences in GH1 gene promoter 
strength may thus lie in the net effect of the differential binding of multiple 
transcription factors to alternative versions of their cognate binding sites. The 
alternative versions of these sites differ by virtue of their containing different alleles of 
the various SNPs that combinatorially constitute the observed array of promoter 
haplotypes. The transcriptional activation of human genes is mediated by the interaction 
of transcription factors with different combinations and permutations of their cognate 
binding sites on the gene promoter. Some transcription factors are coordinated directly 
by cis-acting DNA sequence motifs, others indirectly by protein-protein interactions in 
what has been likened to a three-dimensional jigsaw puzzle: the DNA sequence motifs 
providing the puzzle template, the transcription factors constituting the puzzle pieces. 
This modular view of the promoter helps one to envisage how the effect of different 
SNP combinations in a given haplotype might be transduced so as to exert differential 
effects on transcription factor binding, transcriptosome assembly and hence gene 
expression. Thus, for example, the observed non-additive effects of GH1 promoter 
SNPs on gene expression may be understood in terms of the allele-specific differential 
binding of a given protein at one SNP site affecting in turn the binding of a second 
protein at another SNP site that is itself subject to allele-specific protein binding. 

This study represents the first direct evidence for the existence of functional 
polymorphisms in the GH1 gene. It has previously been claimed that GH1 SNPs at both 
-278 (P-2) and 1169 in intron 4 (P-1) are associated with both height and GH secretion 
after provocative testing (Hasegawa et al. 2000). However, no evidence for any direct 
effect of the -278 SNP was presented in that study, and the association with P-1 may 
have been due to linkage disequilibrium. Although the intronic polymorphism has 
recently been reported to be associated with colorectal neoplasia (Le Marchand et al. 
2002), no association was apparent between P-1 and any of the promoter haplotypes 
reported here. 

The LCR upstream of the GH gene cluster contains sequence elements 
that possess enhancer activity, confer tissue specificity of expression, and promote long 
range gene activation through the spreading of histone acetylation (Shewchuk et al. 
1999; Su et al. 2000; Shewchuk et al. 2001; Ho et al. 2002). The somatotrope-specific 
determinants of the LCR are present within a 1.6 kb region (sites I and II) -14.5 kb 
upstream of the GH1 gene (Shewchuk et al. 1999). In our own system, the introduction 
of this 1.6 kb LCR fragment served to enhance the activity of the GH1 proximal 
promoter by up to 2.8-fold, although the degree of enhancement was found to be 
dependent upon the identity of the linked proximal promoter haplotype. Conversely, 
enhancement of the activity of a proximal promoter of given haplotype was also found 
to be dependent upon the identity of the LCR haplotype. Taken together, these findings 
imply that the genetic basis of inter-individual differences in GH1 gene expression is 
likely to be extremely complex. In this regard, the results are also reminiscent of the 
β-globin LCR in which SNPs have been previously reported in the HS2 and HS4 regions 
(Perichon et al. 1993; Kukreti et al. 2002), with different alleles of the HS2 SNP 
conferring different levels of enhancement upon the expression of a γ-globin 
promoter-linked reporter gene (Ofori-Acquah et al. 2001). 

Promoter polymorphisms affecting human gene expression are not infrequent and an 
increasing number have been characterized by functional studies, e.g. those in the 
plasminogen activator inhibitor type 1 (PAI1; Dawson et al. 1993), tumour necrosis 
factor α (TNF; Wilson et al. 1997), apolipoprotein AI (APOA1; Angotti et al. 1994) and 
lipoprotein lipase (LPL; Hall et al. 1997) genes. The combinatorial effects of SNPs in 
promoter regions are evident from studies of disease-associated polymorphism 
haplotypes. One such example is provided by the PDGFRA gene promoter in which the 
different haplotypes differ in terms of their ability to drive reporter gene expression in 
vitro and are differentially associated with the risk of a neural tube defect (Joosten et al. 
2001). Under the assumption of additivity, attempts have sometimes been made to tease 
out the net effects of individual SNPs by combinatorial functional assay. Our study has 
however served to demonstrate that SNPs within a promoter haplotype exert their 
influence on gene expression in a highly complex and interactive fashion. Such 
non-additive effects are not without precedent, having been reported before in the 
paraoxonase 1 (PON1), interleukin 6 (IL6) and β2-adrenergic receptor (ADRB2) gene 
promoter regions (Terry et al. 2000; Brophy et al. 2001; Drysdale et al. 2000) albeit 
with much smaller numbers of SNPs. If, as appears increasingly likely (Tiret et al. 
2002), such complexity were to be a common feature of gene promoters, it would not 
bode well for the success of conventional DNA polymorphism-disease association 
studies. Indeed, in cases where non-additivity of individual SNPs pertained, haplotype 
analysis would offer certain advantages (Bader 2001; Judson and Stephens 2001). The 
approach described here nevertheless represents a first attempt to partition into its 
constituent components the effect on gene expression of a complex promoter haplotype 
whilst concurrently exploring the interactions between those components. 

Table 1. GH1 proximal promoter haplotypes defined by genetic variation at 16 locations 
No. SNP position relative to GH1 gene transcriptional start site n 
n: frequency in 154 male British Caucasians; §: haplotypes exhibiting a significantly reduced 
level (55% that of haplotype 1) of luciferase activity in GC cells; $: only found in solitary 
cases of GH deficiency. 

Table 2 Double-stranded oligonucleotide primer sequences for EMSA analysis of SNP sites 
exhibiting allele-specific protein binding. SNP sites 11-15 were studied in different allele 
combinations. TSS: transcriptional initiation site. 

SNP/allele        Position from TSS   Sequence 5'→3'
8 A               -89 → -61           CCATGCATAAATGTACACAGAAACAGGTG
                                      CACCTGTTTCTGTGTACATTTATGCATGG
8 G                                   CCATGCATAAATGTGCACAGAAACAGGTG
                                      CACCTGTTTCTGTGCACATTTATGCATGG
9 G               -72 → -42           CAGAAACAGGTGGGGGCAACAGTGGGAGAGA
                                      TCTCTCCCACTGTTGCCCCCACCTGTTTCTG
9 T                                   CAGAAACAGGTGGGGTCAACAGTGGGAGAGA
                                      TCTCTCCCACTGTTGACCCCACCTGTTTCTG
10 G              -45 → -15           GAGAAGGGGCCAGGGTATAAAAAGGGCCCAC
                                      GTGGGCCCTTTTTATACCCTGGCCCCTTCTC
10 ΔG                                 GAGAAGGGGCCAGGTATAAAAAGGGCCCAC
                                      GTGGGCCCTTTTTATACCTGGCCCCTTCTC
11, 12, 13 AAG    -18 → +15           CCACAAGAGACCAGCTCAAGGATCCCAAGGCCC
                                      GGGCCTTGGGATCCTTGAGCTGGTCTCTTGTGG
11, 12, 13 GAG                        CCACAAGAGACCGGCTCAAGGATCCCAAGGCCC
                                      GGGCCTTGGGATCCTTGAGCCGGTCTCTTGTGG
11, 12, 13 GTG                        CCACAAGAGACCGGCTCTAGGATCCCAAGGCCC
                                      GGGCCTTGGGATCCTAGAGCCGGTCTCTTGTGG
14, 15 AA         +4 → +37            ATCCCAAGGCCCAACTCCCCGAACCACTCAGGGT
                                      ACCCTGAGTGGTTCGGGGAGTTGGGCCTTGGGAT
14, 15 GC                             ATCCCAAGGCCCGACTCCCCGCACCACTCAGGGT
                                      ACCCTGAGTGGTGCGGGGAGTCGGGCCTTGGGAT
14, 15 GA                             ATCCCAAGGCCCGACTCCCCGAACCACTCAGGGT
                                      ACCCTGAGTGGTTCGGGGAGTCGGGCCTTGGGAT
14, 15 AC                             ATCCCAAGGCCCAACTCCCCGCACCACTCAGGGT
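
Because each EMSA probe in Table 2 is double-stranded, the second strand of every pair should be the reverse complement of the first. A small sketch of that check is given below for three of the pairs as they appear in the table; the dictionary labels and function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# (first strand, second strand) pairs copied from Table 2
oligo_pairs = {
    "SNP 8 A": ("CCATGCATAAATGTACACAGAAACAGGTG",
                "CACCTGTTTCTGTGTACATTTATGCATGG"),
    "SNP 9 G": ("CAGAAACAGGTGGGGGCAACAGTGGGAGAGA",
                "TCTCTCCCACTGTTGCCCCCACCTGTTTCTG"),
    "SNP 9 T": ("CAGAAACAGGTGGGGTCAACAGTGGGAGAGA",
                "TCTCTCCCACTGTTGACCCCACCTGTTTCTG"),
}

def mismatched_pairs(pairs):
    """List the probes whose second strand is not the reverse complement of the first."""
    return [name for name, (sense, antisense) in pairs.items()
            if reverse_complement(sense) != antisense]

print(mismatched_pairs(oligo_pairs))
```

The same check is a convenient way to spot transcription errors in the remaining probe pairs.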

Table 3: Allele frequencies of 15 SNPs in the GH1 gene promoter of 154 male Caucasians 
and corresponding nucleotides in analogous locations of the paralogous genes of the GH 
cluster 
type sequences of the four paralogous genes in the human GH cluster. 

Table 4 In vitro GH1 gene promoter expression analysis of 40 different SNP haplotypes 

Haplotype No. 
n: number of measurements; μ_nor: mean normalized expression level (i.e. fold change 
compared to H1); σ_nor: standard deviation of expression level; Tukey: result of Tukey's 
studentized range test, haplotypes with overlapping sets of letters are not statistically 
different in terms of their mean expression level; *: non-Gaussian distribution 

Table 5 Haplotype partitioning of GH1 gene promoter expression data 

Haplotype§   Leaf&   No. of haplotypes   n     μ_nor   σ_nor   δ(leaf)
nnCnnn       ...     ...                 ...   ...     ...     ...
...          ...     ...                 ...   ...     ...     32.16
...          3       5                   90    1.035   0.493   21.66
GTTGAA       5       4                   72    1.178   0.384   10.47

No. of haplotypes: number of haplotypes included in leaf; μ_nor: mean normalized expression level; σ_nor: standard 
deviation of expression level; δ(leaf): residual deviance within leaf; §: alleles are given in the order 
of SNP 1, 6, 7, 9, 11 and 14 (n: any base); &: numbering as in Figure 6. 
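
The columns of Table 5 are linked arithmetically. If the residual deviance of a leaf is the sum of squared deviations of its measurements from the leaf mean, and if the reported standard deviation uses the usual n − 1 denominator (both are assumptions, since the definitions are not reproduced here), then δ(leaf) ≈ (n − 1)·σ². The sketch below checks this against the two complete rows; the function names are illustrative.

```python
def residual_deviance(values):
    """Sum of squared deviations of the measurements from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def deviance_from_sd(n, sd):
    """Deviance implied by a sample standard deviation (n - 1 denominator assumed)."""
    return (n - 1) * sd ** 2

# (n, sigma_nor, reported deviance) for the two complete rows of Table 5
rows = {"leaf 3": (90, 0.493, 21.66), "leaf 5": (72, 0.384, 10.47)}

for leaf, (n, sd, reported) in rows.items():
    print(leaf, round(deviance_from_sd(n, sd), 2), "reported:", reported)
```

The small differences from the reported values are of the size expected from rounding σ to three decimals.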

Table 6 Linkage disequilibrium, p, between GH1 proximal promoter SNPs and LCR haplotypes in 
100 male Caucasians 

[Columns for SNPs 4, 6, 8 and 9: only the isolated values 0.958, 0.491 and 0.840 are legible.]

                SNP
          10       11       12*      16
SNP 4     0.731    0.554    0.638    0.567
SNP 6     0.632    0.891    0.867    0.111
SNP 8     0.687    0.925    0.242    0.251
SNP 9     1.000    0.905    1.000    1.000
SNP 10    -.-      0.381    1.000    0.415
SNP 11    0.381    -.-      1.000    0.044
SNP 12*   1.000    1.000    -.-      0.025
SNP 16    0.415    0.044    0.025    -.-

LCR$ (rows in source order)
          10       11       12       16
          0.601    0.782    0.800    0.064
          0.531    0.873    0.831    0.643
          0.875    0.482    1.000    0.289

&: a single chromosome out of 100 [remainder illegible] 
excluded from all LD analyses involving SNP12; $: for each LCR haplotype, p was calculated 
against the combination of the other two LCR haplotypes, thereby turning the LCR into a biallelic 
system. 




Table 7. Results of EMSA assays that demonstrated allele-specific differential protein binding 
at the various SNP sites in the GH1 gene promoter using rat pituitary cell nuclear extracts. 

The numbers of protein interacting bands were tabulated under the headings Strong, Medium and Weak; 
the recovered counts are listed in source order because empty cells were lost.

SNP          Position of double-stranded   Sequence variation   No. of protein        Transcription factor binding
             oligonucleotide                                    interacting bands     site/functional region
8            -89 → -61                     -75 A                -, 1                  Pit-1
                                           -75 G                1, 1                  Pit-1
9            [illegible]                   [illegible]          [illegible]           Vitamin D receptor
                                           -57 G                2, -, -               Vitamin D receptor
10           -45 → -15                     -31 G                1, -                  TATA box
                                           [illegible]          [illegible]           TATA box
11, 12, 13   -18 → +15                     -6/-1/+3 AAG         -                     TSS
                                           -6/-1/+3 GAG         -                     TSS
                                           -6/-1/+3 GTG         1                     TSS
14, 15       +4 → +37                      +16/+25 AA           2, 1                  5'UTR
                                           +16/+25 AC           2                     5'UTR
                                           +16/+25 GC           1                     5'UTR
                                           +16/+25 GA           2, 1                  5'UTR

TSS: Transcriptional start site; 5'UTR: 5' untranslated region 

Table 8 Association between adult height and GH1 proximal promoter haplotype-associated in 
vitro expression data in 124 male Caucasians 

                  A < 0.9   A > 0.9
height < 1.765      34        22
height > 1.765      21        32

A: average normalized in vitro expression level of the two haplotypes of an individual i.e. 

Table 9 Average GC cell-derived, normalized luciferase activities ± standard deviation of different 
LCR-GH1 proximal promoter constructs 

Promoter                             LCR haplotype
haplotype   N                        A               B               C
H1          1.00±0.26 x              2.47±0.41 yz    2.30±0.46 y     2.77±0.55 z
H23         1.00±0.14 x              [illegible]     2.14±0.52 z     1.35±0.48 xy
H27         1.00±[illegible] x       1.11±0.36 x     1.00±0.41 x     1.25±0.27 x

x,y,z: Tukey's studentized range test within a promoter haplotype; constructs (N, A, B 
and C) with overlapping sets of letters are not statistically different in terms of their mean 
expression level. N: Construct containing proximal promoter but lacking LCR. LCR 
haplotypes were normalised with respect to N in each case. 

Table 10 Two-way ANOVA of normalized luciferase activities of LCR-GH1 proximal promoter 
constructs 

Source               DF   Mean Square   F Value   Pr>F
Promoter haplotype    2      51.46       390.97   <.0001
LCR haplotype         3       5.67        43.08   <.0001
Interaction           6       3.09        23.48   <.0001
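
In a two-way ANOVA each F value is the mean square of the effect divided by the same error mean square. The error line is not shown in Table 10, but it can be back-calculated from each row, and the three estimates should agree. The sketch below performs that check; the variable names are illustrative.

```python
# (degrees of freedom, mean square, F value) from Table 10
anova = {
    "Promoter haplotype": (2, 51.46, 390.97),
    "LCR haplotype": (3, 5.67, 43.08),
    "Interaction": (6, 3.09, 23.48),
}

# F = MS_effect / MS_error, hence MS_error = MS_effect / F
implied_error_ms = {source: ms / f for source, (_, ms, f) in anova.items()}

for source, value in implied_error_ms.items():
    print(f"{source}: implied error mean square = {value:.4f}")

spread = max(implied_error_ms.values()) - min(implied_error_ms.values())
```

All three rows imply an error mean square of about 0.13, so the tabulated mean squares and F values are mutually consistent.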




FIGURE LEGENDS 

Figure 1: GH1 gene promoter expression of negative controls as measured on different 
plates (a), and normalized expression levels of the wild-type haplotype (1), displayed as 
multiples of the plate-wise mean expression level of the wild-type (b). 

Figure 2. Location of 16 SNPs in the GH1 promoter relative to the transcriptional start site 
(denoted by an arrow). The hatched box represents exon 1. The positions of the binding sites 
for transcription factors, nuclear factor 1 (NF1), Pit-1 and vitamin D receptor (VDRE), the 
TATA box and the translational initiation codon (ATG) are also shown. 

Figure 3: Normalized expression levels of the 40 GH1 haplotypes relative to the wild-type 
(haplotype 1). Haplotypes associated with a significantly reduced level of luciferase reporter 
gene expression (by comparison with haplotype 1) are denoted by hatched bars. Haplotypes 
associated with a significantly increased level of luciferase reporter gene expression (by 
comparison with haplotype 1) are denoted by solid bars. Haplotypes are arranged in 
decreasing order of prevalence. 

Figure 4: Minimum relative residual deviance δ_R(Π_k,min) of normalized expression levels 
associated with haplotype partitioning using k SNPs (shaded bars). The dotted curve depicts 
the number of haplotypes comprising the minimum-δ_R partitioning. 

Figure 5: Relationship between size and cross-validated δ_R value for minimum deviance 
intermediate trees, using six selected SNPs (nos. 1, 6, 7, 9, 11 and 14). The dotted (horizontal) 
line corresponds to one SE of the cross-validated δ_R of the fully grown tree; the dashed 
(vertical) line indicates the smallest tree for which the cross-validated δ_R lies within one SE of 
that of the fully grown tree. 

Figure 6: Regression tree of GH1 gene promoter expression as obtained by recursive binary 
haplotype partitioning, using six selected SNPs (nos. 1, 6, 7, 9, 11 and 14). Numbers on nodes 
refer to the SNPs by which the respective nodes were split. Terminal nodes ('leaves') are 
depicted as squares and numbered from left to right. 

Figure 7: 'Reduced Median Network' connecting the seven haplotypes (circles) that have 
been observed at least 8 times in 154 male Caucasians. The size of each circle is proportional 
to the frequency of the respective haplotype in the control sample. Haplotypes H12 and H23 
have been included as connecting nodes even though they have been observed only 5 and 2 
times, respectively. SNPs at which haplotypes differ are given alongside each branch. The 
dark dot marks a non-observed haplotype or a double mutation at SNP sites 4 and 5. 

Figure 8: Differences in protein binding capacity between GH1 promoter SNP alleles 
revealed by electrophoretic mobility shift (EMSA) assays. Arrows denote allele-specific 
interacting proteins. The arrowhead denotes the position of a Pit-1-like binding protein. -ve 
(negative control), +ve (positive control), S (specific competitor), N (non-specific 
competitor), P (Pit-1 consensus sequence), P* (prolactin gene Pit-1 binding site), TSS 
(transcriptional initiation site). 

Statement of the Invention 



We describe herein a method of haplotype partitioning to identify 
mutations and/or polymorphisms that are major determinants of phenotype, 
particularly, but not exclusively, phenotype that is either advantageous or 
disadvantageous. For example, perhaps most typically, the method will be used 
to identify mutations and/or polymorphisms that are responsible, wholly or in 
part, for a physiological condition or disorder, such as, for example, a disease or 
abnormal or undesirable state. 

Accordingly, the method of haplotype partitioning of the invention 
comprises examining the residual deviance (δ) for each mutation and/or 
polymorphism of a gene under consideration. 

More ideally the method comprises examining the residual deviance (δ) 
of possible subsets of mutations and/or polymorphisms and so, most 
advantageously, the method is undertaken to examine the residual deviance 
(δ), upon haplotype partitioning {1...m}, of each possible subset of mutations 
and/or polymorphisms. 

Most ideally still the method involves using the following function 

δ = Σ_i (x_i − x̄_π(i))² 

(See page 11 for definitions) 
The method of the invention is thought to be particularly, but not exclusively, 
suited to situations where the effects of said mutations and/or polymorphisms 
are strongly interdependent such as, for example, in the instance where there is 
linkage disequilibrium. 
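
The idea of scoring subsets of polymorphisms by residual deviance can be illustrated with a small sketch. It assumes that the deviance of a partition is the sum of squared deviations of each measurement from the mean of its class, where haplotypes sharing the same alleles at the chosen sites form one class. The toy data, the function names and the exhaustive search are all hypothetical and are not the procedure or data of the study, which additionally used relative deviance, regression trees and cross-validation.

```python
from itertools import combinations

def residual_deviance(expression, sites):
    """Deviance after grouping haplotypes by their alleles at the given site indices."""
    classes = {}
    for haplotype, values in expression.items():
        key = tuple(haplotype[i] for i in sites)
        classes.setdefault(key, []).extend(values)
    total = 0.0
    for values in classes.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total

def best_subset(expression, k):
    """Exhaustively find the k sites whose partition leaves the least deviance."""
    n_sites = len(next(iter(expression)))
    return min(combinations(range(n_sites), k),
               key=lambda sites: residual_deviance(expression, sites))

# Hypothetical toy data: four haplotypes over three sites, two measurements each.
# Expression is driven mainly by the allele at site 0.
toy = {
    "AGC": [1.0, 1.2],
    "ATT": [1.1, 0.9],
    "GGT": [0.4, 0.6],
    "GTC": [0.5, 0.5],
}

print(best_subset(toy, 1), residual_deviance(toy, best_subset(toy, 1)))
```

In this toy example site 0 alone removes most of the deviance, and adding the other two sites reduces it only slightly, which is the kind of pattern that would mark a site as a major determinant.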

Using this methodology it is possible to identify those mutations and/or 
polymorphisms that are responsible for a sizeable proportion of the residual 
deviance in, for example, expression levels - where the mutations and/or 
polymorphisms are in the promoter region of the gene or, for example, protein 
function - where the mutations and/or polymorphisms are in the protein coding 
sequence of the gene. 

Advantageously the methodology of the invention can be used to predict, 
and so subsequently make, super-maximal and sub-minimal haplotypes which 
may be useful, for example, as experiment controls in subsequent testing 
programmes. 

Other methods for the identification of mutations and/or polymorphisms 
responsible for a sizeable proportion of the phenotype under consideration are 
described herein and constitute various aspects and/or embodiments of the 
invention. 

According to a further aspect of the invention there is described herein 
significant mutations and/or polymorphisms, in the form of single nucleotide 
polymorphisms (SNPs), that are major determinants of the phenotype height. 

More specifically, these SNPs, which are located in the proximal promoter 
of the growth hormone gene (GH1), determine the level of expression of growth 
hormone and so the likely height of an individual. 

It follows that knowledge of these SNPs or this subset of SNPs has utility 
in diagnostic techniques. 

According to a further aspect of the invention there is provided a detection 
method for detecting a variation in GH1 effective to act as an indicator of growth 
hormone dysfunction in an individual, which detection method comprises the 
steps of: 

(a) obtaining a test sample of genetic material from an individual to be 
tested, said material comprising, at least, a human GH1 gene or a 
fragment thereof; and 

(b) determining, and then, analysing the nucleotide sequence of said 
GH1 gene, or fragment thereof, to see if any single nucleotide 
polymorphisms exist at any one or more of the SNP sites within the 
following SNP subset; 

SNPs Nos. 1, 6, 7, 9, 11 and 14 (as described in Figure 2). 

Alternatively, the above method may simply comprise determining 
whether there is a single nucleotide polymorphism at SNP7. 

Ideally the diagnostic method of the invention comprises determining the 
nucleotide sequence of the GH1 gene, or part thereof, in said test sample by 
sequencing methods employing PCR amplification of the GH1 gene, or fragment 
thereof, using a nucleotide fragment that is specific for said GH1 gene, or a part 
thereof. 

Most ideally the diagnostic method of the invention involves PCR 
amplification of the proximal promoter region of the GH1 gene and subsequent 
analysis of the amplified material to determine whether a single nucleotide 
polymorphism exists at one or more of the SNP sites designated 1, 6, 7, 9, 11 
and 14. 

In the instance where SNPs are identified at the aforementioned sites the 
diagnostic method is concluded by comparing the SNPs with published 
information or that contained herein and determining whether a correlation 
means that there is a high likelihood that the individual being tested is 
under-expressing growth hormone and is likely to suffer from known 
effects/complications associated with this physiology, or is over-expressing growth 
hormone and so is likely to suffer from known effects/complications associated 
with this physiology. 

As previously mentioned, using the haplotype partitioning method 
described herein it is possible to determine a super-maximal and sub-minimal 
haplotype and therefore the invention, according to a further aspect, also 
comprises the identification of a super-maximal and/or sub-minimal haplotype for 
the growth hormone gene.

The super-maximal haplotype (AGGGGTTAT-ATGGAG) is defined by a 
coding sequence that, in the embodiment described herein, enhances 
expression of growth hormone. Thus, in the study herein described, the
presence of these proximal promoter SNPs in the growth hormone gene leads to
increased promoter activity and so over expression of growth hormone. This in
turn leads to an individual presenting a greater than average height. 
Conversely, the sub-minimal haplotype (AG-TTTTGGGGCCACT) defines a 
nucleotide sequence encoding a growth hormone gene whose promoter exhibits 
depressed activity and so overall production of growth hormone is reduced. An 
individual with this haplotype would be characterised by exhibiting a shorter than 
average height. 
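
The two haplotypes named above can be matched against a genotyped promoter haplotype with a simple lookup. The following is a minimal sketch, assuming haplotypes are recorded as 16-character strings in the same notation as the text. The function name `classify_haplotype` and the label strings are illustrative only, and any haplotype other than the two quoted here is reported as unclassified rather than guessed at.

```python
# Haplotype strings copied from the text; the labels paraphrase the effect
# on promoter activity that the text attributes to each.
KNOWN_HAPLOTYPES = {
    "AGGGGTTAT-ATGGAG": "super-maximal (enhanced promoter activity)",
    "AG-TTTTGGGGCCACT": "sub-minimal (depressed promoter activity)",
}


def classify_haplotype(haplotype):
    """Return the label for a known haplotype, or 'unclassified'."""
    key = haplotype.strip().upper()
    return KNOWN_HAPLOTYPES.get(key, "unclassified")


# The third string is a made-up haplotype used only to show the fallback.
for h in ("AGGGGTTAT-ATGGAG", "ag-ttttggggccact", "AGGGGTTAT-ATGGAA"):
    print(h, "->", classify_haplotype(h))
```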

According to a further aspect of the invention there is therefore described 
a variant of GH1 as described herein.

According to yet a further aspect of the invention there is provided a
screening method for screening an individual suspected of growth hormone
dysfunction, which screening method comprises the steps of:

(a) obtaining a test sample from an individual to be tested which 
sample contains genetic material comprising, at least, the growth 
hormone gene GH1 or a fragment thereof; 

(b) sequencing said gene, or fragment thereof; 

(c) analysing said sequence for single nucleotide polymorphisms at
any one of the following SNP sites within the proximal promoter (as
illustrated in Figure 2 hereof): 1, 6, 7, 9, 11 and 14; and

(d) comparing the identified single nucleotide polymorphisms with
those in the published literature, or as described herein, and determining
whether a correlation exists indicating that said individual is likely to
suffer from growth hormone dysfunction.

Alternatively, the above screening method may simply involve analysing 
said sequence for a single nucleotide polymorphism at SNP7. 

Reference herein to growth hormone dysfunction includes reference to
growth that is above or below average and requires knowledge thereof or
treatment thereof by a clinician or patient.

According to a further aspect of the invention there is provided a kit for
carrying out the aforementioned diagnostic and/or screening methods. Ideally 
said kit contains the means for sequencing said growth hormone gene, or a part 
thereof and, optionally, information concerning said SNP sites 1, 6, 7, 9, 11 and 
14 (as herein described). 



The investigations related herein, centered on the growth hormone gene, 
led to the surprising conclusion that, in addition to the above identified major 
SNPs, there existed three additional SNPs upstream from the growth hormone 
gene, which had a significant role to play in determining growth hormone 
dysfunction and thus the likely height of an individual. These three SNPs were 
found to be within sites I and II of the upstream Locus Control Region (LCR) of 
GH1. These were located at nucleotide positions 990G, 1144A and 1194C. We
ascribed these three additional SNPs to three distinct haplotypes and we have
discovered that these haplotypes enhance proximal promoter activity by up to
2.8-fold depending upon the SNPs that exist in the proximal promoter or, put
another way, depending on the haplotype of the proximal promoter.

Accordingly therefore, the aforementioned methods of the invention may, 
additionally or alternatively, comprise: 

(a) obtaining a sample from an individual to be tested which sample 
comprises the genetic material encoding the Locus Control Region, or a 
part thereof, of the GH1 gene;

(b) sequencing said Locus Control Region, or part thereof, in order to 
determine the genetic code thereof; 

(c) analysing said sequence to determine whether it contains any one 
or more of the three SNPs located within sites I and II (990G, 1144A, 
1194C) of the LCR region as described herein; and 

(d) where said SNPs are present concluding that the LCR region of 
the growth hormone gene is likely to enhance proximal promoter activity 
and so increase expression of growth hormone.

Alternatively, said method may comprise simply analysing said sequence
to determine whether a single nucleotide polymorphism exists at SNP7.
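
Step (c) of the LCR method can be sketched as a check of sequencing calls against the three alleles named in the text. This is a hedged illustration: it assumes base calls are available as a mapping keyed by the same nucleotide numbering used herein, and the function name `lcr_alleles_present` and the example calls are hypothetical.

```python
# The three LCR alleles named in the text, keyed by nucleotide position.
LCR_ALLELES = {990: "G", 1144: "A", 1194: "C"}


def lcr_alleles_present(calls):
    """List which of the named LCR alleles appear in the base calls.

    Assumption: calls maps an LCR nucleotide position (numbered as in the
    text) to the base observed there; positions not called are skipped.
    """
    return [
        f"{position}{base}"
        for position, base in sorted(LCR_ALLELES.items())
        if calls.get(position, "").upper() == base
    ]


# Hypothetical base calls for one individual, for illustration only.
example_calls = {990: "G", 1144: "G", 1194: "c"}
print(lcr_alleles_present(example_calls))
```

The returned list can then feed step (d): a non-empty list is the condition under which the text concludes that the LCR is likely to enhance proximal promoter activity.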

It will be apparent to one skilled in the art that, once it has been concluded
that growth hormone is over expressed, the individual is likely to grow to
have, or already to have, above average height.

In an alternative aspect or embodiment of the invention the 
aforementioned methods may comprise, under step (d), concluding that where 
an individual has single nucleotide polymorphisms in any of the major 
determinant SNP sites in the proximal promoter and single nucleotide 
polymorphisms in said LCR region then said individual is likely to over express 
growth hormone and so grow to be, or be, above average height. 

According to a further aspect of the invention there is provided the use of 
a growth hormone variant, as herein described, and in particular the use of a 
haplotype of the proximal promoter region of the growth hormone gene, in the
manufacture of a composition to treat growth hormone dysfunction. Further, 
there is provided a composition containing said variant or haplotype. 

According to a further aspect of the invention there is provided the use of 
an LCR variant, as herein described, and in particular the use of an LCR
haplotype, in the manufacture of a composition to treat growth hormone
dysfunction. Further, there is provided a composition containing said LCR
variant or haplotype.

According to yet a further aspect of the invention there is provided the use
of a combination of a growth hormone variant and an LCR variant, as herein
described, in the manufacture of a composition to treat growth hormone
dysfunction. Further, there is provided a composition containing said growth
hormone variant and said LCR variant.

Moreover, there is provided use of one or more of the subset of the 
proximal promoter haplotypes of GH1 (SNPs 1, 6, 7, 9, 11 and 14) or the LCR 
haplotypes of GH1 to treat growth hormone dysfunction. 